In this course we are going to introduce basic analysis for single-cell RNAseq, with a specific focus on the 10X system. The course is divided into multiple sessions.
In this second session, we will introduce Seurat and Bioconductor methodologies for basic scRNA-seq QC, filtering and initial analysis of single samples.
For these sessions we are going to make use of two datasets.
The first set will be from the recent paper Enteroendocrine
cell lineages that differentially control feeding and gut
motility.
This contains scRNA data from either Neurod1 and Neurog3 expressing
enteroendocrine cells.
The second dataset is the classic example from PBMC cells.
The Seurat R package and associated packages offer an R based and well established methodology for the analysis of single cell RNA-seq, ATAC-seq and many other single-cell sequencing methodologies.
Bioconductor has a set of interrelated and highly connected software packages for single cell RNAseq as well as integrating with the rest of the Bioconductor software ecosystem. OSCA book
In this session we will run through basic analyses using both of the methodologies to show both their equvalence in some parts of the analysis as well as their distinct functionalities in others.
In practice, you may select methods from either softwares and convert between the two as required.
For this session we will need the filtered and raw expression data for the Neurod1 datasets.
The filtered matrix can be found here
The raw matrix can be found here here
First we need to load the DropletUtils package to read and handle our Droplet 10X data.
We can then read in the filtered matrix containing data on droplets marked as cells by CellRanger.
We set the row.names argument for easier interpretation.
h5file <- "https://rubioinformatics.s3.amazonaws.com/scRNA_graduate/SRR_NeuroD1/outs/filtered_feature_bc_matrix.h5"
local_h5file <- basename(h5file)
download.file(h5file, local_h5file)
sce.NeuroD1_filtered <- read10xCounts(local_h5file, col.names = TRUE, row.names = "symbol")
class(sce.NeuroD1_filtered)## [1] "SingleCellExperiment"
## attr(,"package")
## [1] "SingleCellExperiment"
The created SingleCellExperiment Object is much like our SummarizedExperiment object
## class: SingleCellExperiment
## dim: 39905 3468
## metadata(1): Samples
## assays(1): counts
## rownames(39905): Gm26206 Xkr4 ... MT-TrnT MT-TrnP
## rowData names(3): ID Symbol Type
## colnames(3468): AAACCCACACAATGTC-1 AAACCCACACCACTGG-1 ...
## TTTGTTGTCTCTTAAC-1 TTTGTTGTCTGGACCG-1
## colData names(2): Sample Barcode
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
The colnames() and rownames() functions can be used to access column names (cell-barcodes) and row names (gene identifiers)
## [1] "AAACCCACACAATGTC-1" "AAACCCACACCACTGG-1"
## [1] "Gm26206" "Xkr4"
The colData() and rowData() functions can be used to access experiment metadata.
## DataFrame with 2 rows and 2 columns
## Sample Barcode
## <character> <character>
## AAACCCACACAATGTC-1 filtered_feature_bc_.. AAACCCACACAATGTC-1
## AAACCCACACCACTGG-1 filtered_feature_bc_.. AAACCCACACCACTGG-1
## DataFrame with 2 rows and 3 columns
## ID Symbol Type
## <character> <character> <character>
## Gm26206 Gm26206 Gm26206 Gene Expression
## Xkr4 Xkr4 Xkr4 Gene Expression
A reducedDim and reducedDimNames slots at present remains unfilled.
The metadata slot contains the Sample names. Here all these cells came from single sample
## character(0)
## $Samples
## [1] "filtered_feature_bc_matrix.h5"
One of the first plots we may want to recreate from the QC is the knee plot.
To do this we will need the unfiltered matrix too.
The unfiltered SingleCellExperiment contains over a million droplets.
h5file <- "https://rubioinformatics.s3.amazonaws.com/scRNA_graduate/SRR_NeuroD1/outs/raw_feature_bc_matrix.h5"
local_h5file <- "NeuroD1_raw_feature_bc_matrix.h5"
download.file(h5file, local_h5file)
sce.NeuroD1_unfiltered <- read10xCounts(local_h5file, col.names = TRUE)
sce.NeuroD1_unfiltered## class: SingleCellExperiment
## dim: 39905 1244317
## metadata(1): Samples
## assays(1): counts
## rownames(39905): Gm26206 Xkr4 ... MT-TrnT MT-TrnP
## rowData names(3): ID Symbol Type
## colnames(1244317): AAACCCAAGAAACACT-1 AAACCCAAGAAACCAT-1 ...
## TTTGTTGTCTTTGCTA-1 TTTGTTGTCTTTGTCG-1
## colData names(2): Sample Barcode
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
We can use the barcodeRanks function to create a table of barcode ranks and total reads per droplet.
## DataFrame with 1244317 rows and 3 columns
## rank total fitted
## <numeric> <integer> <numeric>
## AAACCCAAGAAACACT-1 45958 155 NA
## AAACCCAAGAAACCAT-1 102639 4 NA
## AAACCCAAGAAACCCG-1 629672 1 NA
## AAACCCAAGAAACCTG-1 629672 1 NA
## AAACCCAAGAAAGAAC-1 1096187 0 NA
## ... ... ... ...
## TTTGTTGTCTTTGATC-1 36930.5 167 NA
## TTTGTTGTCTTTGCAT-1 629672.5 1 NA
## TTTGTTGTCTTTGCGC-1 629672.5 1 NA
## TTTGTTGTCTTTGCTA-1 629672.5 1 NA
## TTTGTTGTCTTTGTCG-1 629672.5 1 NA
We then filter for droplets with the same rank for visualisation purposes.
We can now recreate the knee plot from CellRanger.
require(ggplot2)
ggplot(bc_plot, aes(x = rank, y = total, colour = filtered, alpha = 0.001)) + geom_point() +
scale_y_log10() + scale_x_log10() + theme_minimal() + geom_hline(yintercept = metadata(bcrank)$inflection,
colour = "darkgreen", linetype = 2) + geom_hline(yintercept = metadata(bcrank)$knee,
colour = "dodgerblue", linetype = 2)## Warning in scale_y_log10(): log-10 transformation introduced infinite values.
Although CellRanger provides a filtered Droplet matrix, you may wish to provide your own cut-off based on knee plots generated and/or use the emptyDrops method in Droplet utils to identify cell containing droplets.
## DataFrame with 6 rows and 5 columns
## Total LogProb PValue Limited FDR
## <integer> <numeric> <numeric> <logical> <numeric>
## AAACCCAAGAAACACT-1 118 -350.537 0.30007 FALSE 1
## AAACCCAAGAAACCAT-1 2 NA NA NA NA
## AAACCCAAGAAACCCG-1 0 NA NA NA NA
## AAACCCAAGAAACCTG-1 0 NA NA NA NA
## AAACCCAAGAAAGAAC-1 0 NA NA NA NA
## AAACCCAAGAAAGCGA-1 1 NA NA NA NA
From this point we will use the Droplets filtered using Cellranger
## class: SingleCellExperiment
## dim: 39905 3468
## metadata(1): Samples
## assays(1): counts
## rownames(39905): Gm26206 Xkr4 ... MT-TrnT MT-TrnP
## rowData names(3): ID Symbol Type
## colnames(3468): AAACCCACACAATGTC-1 AAACCCACACCACTGG-1 ...
## TTTGTTGTCTCTTAAC-1 TTTGTTGTCTGGACCG-1
## colData names(2): Sample Barcode
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
To assess QC we first want to identify mitochondrial and ribosomal genes.
High mitochondrial gene expression within a cell is often used as a marker of dying cells.
is.mito <- grepl("^MT", rowData(sce.NeuroD1_filtered)$Symbol)
# is.ribo <- grepl('^Rps',rowData(sce.NeuroD1_filtered)$Symbol)
table(is.mito)## is.mito
## FALSE TRUE
## 39868 37
The scuttle package has functions for normalisation, transformation and QC.
Here we us the addPerCellQCMetrics function to update our SingleCellExperiment object with QC information.
library(scuttle)
sce.NeuroD1_filtered <- addPerCellQCMetrics(sce.NeuroD1_filtered, subsets = list(Mito = is.mito))
sce.NeuroD1_filtered## class: SingleCellExperiment
## dim: 39905 3468
## metadata(1): Samples
## assays(1): counts
## rownames(39905): Gm26206 Xkr4 ... MT-TrnT MT-TrnP
## rowData names(3): ID Symbol Type
## colnames(3468): AAACCCACACAATGTC-1 AAACCCACACCACTGG-1 ...
## TTTGTTGTCTCTTAAC-1 TTTGTTGTCTGGACCG-1
## colData names(8): Sample Barcode ... subsets_Mito_percent total
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
The QC data per Droplet has been added to additional columns in the colData slot of our SingleCellExperiment.
## DataFrame with 2 rows and 8 columns
## Sample Barcode sum detected
## <character> <character> <numeric> <integer>
## AAACCCACACAATGTC-1 filtered_feature_bc_.. AAACCCACACAATGTC-1 23320 4776
## AAACCCACACCACTGG-1 filtered_feature_bc_.. AAACCCACACCACTGG-1 1944 176
## subsets_Mito_sum subsets_Mito_detected subsets_Mito_percent
## <numeric> <integer> <numeric>
## AAACCCACACAATGTC-1 3358 19 14.3997
## AAACCCACACCACTGG-1 1577 15 81.1214
## total
## <numeric>
## AAACCCACACAATGTC-1 23320
## AAACCCACACCACTGG-1 1944
We can then plot this using the plotColData function from the Scater package.
First we plot the distritbution of total reads per cell.
Then we plot the distritbution of total detected genes per cell.
Then we plot the distritbution of percent of mitochondrial reads per cell.
It also can be useful to capture the QC columns against each other as scatterplots
library(scater)
p1 <- plotColData(sce.NeuroD1_filtered, x = "sum", y = "detected")
p2 <- plotColData(sce.NeuroD1_filtered, x = "sum", y = "subsets_Mito_percent")
p3 <- plotColData(sce.NeuroD1_filtered, x = "detected", y = "subsets_Mito_percent")
gridExtra::grid.arrange(p1, p2, p3, ncol = 3)At this point we may want to remove low quality data.
Typically droplet filters on mitochondrial read content, total detect genes and total read counts are applied.
qc.high_lib_size <- colData(sce.NeuroD1_filtered)$sum > 125000
qc.min_detected <- colData(sce.NeuroD1_filtered)$detected < 200
qc.mito <- colData(sce.NeuroD1_filtered)$subsets_Mito_percent > 25
discard <- qc.high_lib_size | qc.mito | qc.min_detected
DataFrame(LibSize = sum(qc.high_lib_size), Detected = sum(qc.min_detected), MitoProp = sum(qc.mito),
Total = sum(discard))## DataFrame with 1 row and 4 columns
## LibSize Detected MitoProp Total
## <integer> <integer> <integer> <integer>
## 1 13 473 1767 1792
We will add this back in the colData slot to perform some final QC plots before filtering
colData(sce.NeuroD1_filtered) <- cbind(colData(sce.NeuroD1_filtered), DataFrame(toDiscard = discard))
p1 <- plotColData(sce.NeuroD1_filtered, x = "sum", y = "detected", colour_by = "toDiscard")
p2 <- plotColData(sce.NeuroD1_filtered, x = "sum", y = "subsets_Mito_percent", colour_by = "toDiscard")
p3 <- plotColData(sce.NeuroD1_filtered, x = "detected", y = "subsets_Mito_percent",
colour_by = "toDiscard")
gridExtra::grid.arrange(p1, p2, p3, ncol = 3)At this point we may want to remove the low quality cells from further analysis
sce.NeuroD1_filtered_QCed <- sce.NeuroD1_filtered[, sce.NeuroD1_filtered$toDiscard %in%
"FALSE"]
sce.NeuroD1_filtered_QCed## class: SingleCellExperiment
## dim: 39905 1676
## metadata(1): Samples
## assays(1): counts
## rownames(39905): Gm26206 Xkr4 ... MT-TrnT MT-TrnP
## rowData names(3): ID Symbol Type
## colnames(1676): AAACCCACACAATGTC-1 AAACCCACACGAGAAC-1 ...
## TTTGGAGTCCGTAGGC-1 TTTGTTGGTCTAACTG-1
## colData names(9): Sample Barcode ... total toDiscard
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
As with our RNA-seq and genomics data we want to normalise for total library size and perfrom a log2 transformation.
sce.NeuroD1_filtered_QCed <- logNormCounts(sce.NeuroD1_filtered_QCed)
assayNames(sce.NeuroD1_filtered_QCed)## [1] "counts" "logcounts"
An alternative approach for scRNA is to normalise accounting for potentially different cell-populations.
As we dont know cell populations yet, a quick clustering of data followed by this normalisation using clusters is performed.
library(scran)
clust.sce.NeuroD1_filtered_QCed <- quickCluster(sce.NeuroD1_filtered_QCed)
sce.NeuroD1_filtered_trimmed <- computeSumFactors(sce.NeuroD1_filtered_QCed, cluster = clust.sce.NeuroD1_filtered_QCed)
sce.NeuroD1_filtered_QCed <- logNormCounts(sce.NeuroD1_filtered_QCed)
assayNames(sce.NeuroD1_filtered_QCed)## [1] "counts" "logcounts"
As with RNA-seq we want to account for mean-dispersion relationship.
In Bioconductor we can use the modelGeneVar function to perform this for us.
##
dec.NeuroD1_filtered_QCed <- modelGeneVar(sce.NeuroD1_filtered_QCed)
ggplot(dec.NeuroD1_filtered_QCed, aes(x = mean, y = total)) + geom_point()Having fit the mean-dispersion relationship we can extract the most variable genes for use in PCA dimension reduction.
We can now perform a PCA on our data.
The PCs will be used for clustering and 2D-representation of our data.
set.seed(100) # See below.
sce.NeuroD1_filtered_QCed <- fixedPCA(sce.NeuroD1_filtered_QCed, subset.row = top.NeuroD1_filtered_QCed)
reducedDimNames(sce.NeuroD1_filtered_QCed)## [1] "PCA"
We can produce a quick plot of the PCs to see if there are any biases immediately evident in the PCs.
For further analysis we want to select the PCs which explain most of the data.
To do this we can plot the variance explained by each PC and identify an elbow point.
percent.var <- attr(reducedDim(sce.NeuroD1_filtered_QCed), "percentVar")
plot(percent.var, log = "y", xlab = "PC", ylab = "Variance explained (%)")With our PCs now calculated we can project these onto a 2D representation of the data. Here we will use two popular methods of TSNE and UMAP.
sce.NeuroD1_filtered_QCed <- runTSNE(sce.NeuroD1_filtered_QCed, n_dimred = 30)
sce.NeuroD1_filtered_QCed <- runUMAP(sce.NeuroD1_filtered_QCed, n_dimred = 30)
reducedDimNames(sce.NeuroD1_filtered_QCed)## [1] "PCA" "TSNE" "UMAP"
Witht the TSNE and UMAP we may first want to overlay some features of QC to see if this affects the data still.
For clustering we can use the set of tools available in the bluster package.
Here we run the default graph based methods walktrap and louvain from the igraph package.
Once we have the clustering we will add them to the colData for the object and replot the UMAP
colLabels(sce.NeuroD1_filtered_QCed) <- clust.louvain
colData(sce.NeuroD1_filtered_QCed)$DefaultLabel <- clust.default
plotUMAP(sce.NeuroD1_filtered_QCed, colour_by = "label")We can now plot the mito percent against clusters to see if any clusters are driven by the mito percent of droplets.
plotColData(sce.NeuroD1_filtered_QCed, x = "label", y = "subsets_Mito_percent", colour_by = "label")We can then compare our overlaps between our clusterings in a simple heatmap.
tab <- table(Walktrap = clust.default, Louvain = clust.louvain)
rownames(tab) <- paste("Walktrap", rownames(tab))
colnames(tab) <- paste("louvain", colnames(tab))
library(pheatmap)
pheatmap(log10(tab + 10), color = viridis::viridis(100), cluster_cols = FALSE, cluster_rows = FALSE)The paper also provides their annotation of cells. We can read this in and overlay with our umap.
eec_paper_meta <- read.delim("https://rubioinformatics.s3.amazonaws.com/scRNA_graduate/GSE224223_EEC_metadata.csv",
sep = ";")
nrd1_cells <- as.data.frame(eec_paper_meta[eec_paper_meta$Library == "Nd1", ])
nrd1_cells$X <- gsub("_2$", "", nrd1_cells$X)
newMeta <- merge(colData(sce.NeuroD1_filtered_QCed), nrd1_cells, by.x = "Barcode",
by.y = "X", all.x = TRUE, all.y = FALSE, order = FALSE)
rownames(newMeta) <- newMeta$Barcode
newMeta$All_Cell_Types[is.na(newMeta$All_Cell_Types)] <- "Filtered"
colData(sce.NeuroD1_filtered_QCed) <- newMeta
plotUMAP(sce.NeuroD1_filtered_QCed, colour_by = "All_Cell_Types")We can also see how the annotation overlaps with our clusters.
tab <- table(Walktrap = sce.NeuroD1_filtered_QCed$All_Cell_Types, Louvain = sce.NeuroD1_filtered_QCed$label)
rownames(tab) <- paste("CellType", rownames(tab))
colnames(tab) <- paste("Leiden", colnames(tab))
library(pheatmap)
pheatmap(log10(tab + 10), color = viridis::viridis(100), cluster_cols = FALSE, cluster_rows = FALSE)Now we have clusters defined we can identify genes which are differentially expressed within a cluster compared to other clusters.
markers.sce.NeuroD1 <- scoreMarkers(sce.NeuroD1_filtered_QCed, sce.NeuroD1_filtered_QCed$label)
chosen <- markers.sce.NeuroD1[["4"]]
ordered <- chosen[order(chosen$mean.AUC, decreasing = TRUE), ]
head(ordered[, 1:4]) # showing basic stats only, for brevity.## DataFrame with 6 rows and 4 columns
## self.average other.average self.detected other.detected
## <numeric> <numeric> <numeric> <numeric>
## Pzp 3.39590 0.0791581 0.975207 0.0501323
## Cck 11.32544 4.6047391 1.000000 0.9882847
## Agr2 6.82305 1.5954380 1.000000 0.5219864
## Scg2 6.57950 1.9450183 1.000000 0.5898348
## Upp1 4.17198 1.3609008 0.995868 0.5891026
## Agr3 4.05125 0.9985682 1.000000 0.4145315
We can use Seurat to import our data as before for Bioconductor.
library(Seurat)
h5file <- "https://rubioinformatics.s3.amazonaws.com/scRNA_graduate/SRR_NeuroD1/outs/filtered_feature_bc_matrix.h5"
local_h5file <- basename(h5file)
download.file(h5file, local_h5file)
Nd1T.mat <- Read10X_h5(filename = local_h5file)
Nd1T.obj <- CreateSeuratObject(Nd1T.mat, project = "Nd1")## Warning: Feature names cannot have underscores ('_'), replacing with dashes
## ('-')
The Seurat object contains now our information on Neurod1 droplets.
## An object of class Seurat
## 39905 features across 3468 samples within 1 assay
## Active assay: RNA (39905 features, 0 variable features)
## 1 layer present: counts
We can access rownames and column names in a similar manner as before.
## [1] "Gm26206" "Xkr4"
## [1] "AAACCCACACAATGTC-1" "AAACCCACACCACTGG-1"
To access metadata we need to directly access the slot
## orig.ident nCount_RNA nFeature_RNA
## AAACCCACACAATGTC-1 Nd1 23320 4776
## AAACCCACACCACTGG-1 Nd1 1944 176
We can also review the active assay and active cell-barcode identifiers.
## [1] "RNA"
## AAACCCACACAATGTC-1 AAACCCACACCACTGG-1
## Nd1 Nd1
## Levels: Nd1
The get access to the assays we can pull directly from the relevent slots.
## 2 x 2 sparse Matrix of class "dgCMatrix"
## AAACCCACACAATGTC-1 AAACCCACACCACTGG-1
## Gm26206 . .
## Xkr4 . .
As with Biocondcutor we must again identify mitochondrial genes.
## [1] "MT-TrnF" "MT-mt-Rnr1" "MT-TrnV" "MT-mt-Rnr2" "MT-TrnL1"
## [6] "MT-ND1" "MT-TrnI" "MT-TrnQ" "MT-TrnM" "MT-ND2"
We can the use PercentageFeatureSet function to identify percent of mito reads in each cell and the AddMetaData function to add this to the Seurat object.
percent.mt <- PercentageFeatureSet(Nd1T.obj, features = mito.genes)
Nd1T.obj <- AddMetaData(Nd1T.obj, metadata = percent.mt, col.name = "percent.mt")
head(Nd1T.obj@meta.data)## orig.ident nCount_RNA nFeature_RNA percent.mt
## AAACCCACACAATGTC-1 Nd1 23320 4776 14.399657
## AAACCCACACCACTGG-1 Nd1 1944 176 81.121399
## AAACCCACACGAGAAC-1 Nd1 76762 7564 9.710534
## AAACCCAGTGTCCGGT-1 Nd1 17440 4177 30.928899
## AAACGAAAGCCTCGTG-1 Nd1 9512 2187 29.447014
## AAACGAAAGCCTTGAT-1 Nd1 11320 3181 14.187279
We can then review QC metrics across our droplets.
## Warning: Default search for "data" layer in "RNA" assay yielded no results;
## utilizing "counts" layer instead.
As well as plot QC metrics against each other.
plot1 <- FeatureScatter(Nd1T.obj, feature1 = "nCount_RNA", feature2 = "percent.mt")
plot2 <- FeatureScatter(Nd1T.obj, feature1 = "nCount_RNA", feature2 = "nFeature_RNA")
plot3 <- FeatureScatter(Nd1T.obj, feature1 = "nFeature_RNA", feature2 = "percent.mt")
plot1 + plot2 + plot3We use the subset function to remove low quality cells from our object.
Nd1T.obj.filt <- subset(Nd1T.obj, subset = nCount_RNA < 125000 & percent.mt < 25 &
nFeature_RNA > 200)
Nd1T.obj.filt## An object of class Seurat
## 39905 features across 1676 samples within 1 assay
## Active assay: RNA (39905 features, 0 variable features)
## 1 layer present: counts
We will now normalise our data by scaling to library size and applying a log2 transformation.
## Normalizing layer: counts
As with Bioc we will now identify variable features.
Here the mean-variance relationship is accounted for by applying a VST transformation.
## Finding variable features for layer counts
## [1] "Olfm4" "Iapp" "H2ac23" "Dmbt1" "Ube2c" "Asic5" "Ifitm3" "Reg3b"
## [9] "Hmgb2" "Reg3g"
We can then plot these features using the VariableFeaturePlot function.
plot1 <- VariableFeaturePlot(Nd1T.obj.filt)
plot2 <- LabelPoints(plot = plot1, points = top10, repel = TRUE)## When using repel, set xnudge and ynudge to 0 for optimal results
## Warning in scale_x_log10(): log-10 transformation introduced infinite values.
In Bioconductor the scaling of data was done inside the PCA functions.
Here we apply the scaling using the ScaleData function.
all.genes <- rownames(Nd1T.obj.filt)
Nd1T.obj.filt <- ScaleData(Nd1T.obj.filt, features = all.genes)## Centering and scaling data matrix
## $RNA
## Assay (v5) data with 39905 features for 1676 cells
## Top 10 variable features:
## Olfm4, Iapp, H2ac23, Dmbt1, Ube2c, Asic5, Ifitm3, Reg3b, Hmgb2, Reg3g
## Layers:
## counts, data, scale.data
Now we have our scaled data we can apply PCA and identify how many PCs to use in downstream analysis.
## PC_ 1
## Positive: Mir6236, Apoa1, Bnip5, Trpm2, Apoa4, Apob, Ada, Nts, Cyp3a11, Gm53845
## Vgf, Klf4, Col27a1, Gm57596, Rfx6, Prss30, Sst, Pmp22, Gm32584, Rasal2
## Ly6m, Nrp1, Gm50599, Slc38a11, Adgrd1, Ahnak, Slc26a4, Afp, Slc28a2, Apol7a
## Negative: Rps2, Rps8, Rpl12, Rpl27a, Eef1b2, Rplp0, Gvin-ps1, Rpl32, Rps15a, Rpl22
## Cps1, Rpl18a, Rpl13, Rps20, Rps16, Rpl35, Rps7, Rps28, Rpl23, Rps3
## Rpl28, Rps23, Rps4x, Rpl35a, Rps12, Rps24, Rps10, Rps27a, Nme2, Rpl30
## PC_ 2
## Positive: Rbp4, Scgn, Gm11992, Ptma, Bambi, Etv1, Parm1, Cltrn, Isl1, Qpct
## Tmem176a, Kcnk16, Crp, Gm53467, Pzp, Gnai1, Rps3a1, Galnt13, Dner, Rpl6
## Rps24, Rfxap, Cpn1, Tril, Cyp2j5, Rpl12, Smbd1, Kcnj6, Agr2, Tff3
## Negative: Spink1, Clec2h, Mep1b, Ace2, Guca2b, Slc51a, Apoa1, Ces2e, Ggt1, Lgals3
## Alpi, Apoa4, Clca4b, Ifi27l2b, Dpep1, Mogat2, Adh6a, H2-T3, Ace, Treh
## Maf, Xdh, Cndp2, Maoa, Fabp2, Apol10a, H2-T26, Mall, Arg2, Dnase1
## PC_ 3
## Positive: Rbp4, Scgn, Isl1, Gm11992, Gsto1, Cyp2j6, Etv1, Bambi, Gpd2, Ugt2b34
## S100a11, Abcc8, St3gal4, Cltrn, Parm1, Fabp2, Ghrl, Gnai1, Kcnk16, Pzp
## Galnt13, Crp, Higd1a, Kcnj6, Ano6, Acvr1c, Plcxd3, Tril, Slc41a2, Cyp2j5
## Negative: Tpbg, Slc38a11, Lmx1a, Rasal2, Ccnd2, Krt19, Afp, Trpa1, Shisa2, Pla2g7
## Amigo2, Trpm2, Serpinf2, Lypd8l, Sorl1, Slc5a4b, Gm53845, Cwh43, Rpp25, Grik4
## Adh1, 9330154J02Rik, Rab3c, Lonrf1, Fabp3, Apoa5, Nars2, Slc5a9, Ptgr1, Tmem238l
## PC_ 4
## Positive: Top2a, Nusap1, Gvin-ps1, Clca3b, Aurkb, Cenpf, D17H6S56E-5, Ckap2l, Hmmr, Mir6236
## H1f5, Cenpe, Hmgb2, Cdca2, Melk, Esco2, Cd200l2, Stmn1, Bub1b, Prc1
## Uhrf1, Ckap2, Ncapg, H2ac23, Cdca7, Cdca3, Cenpw, Slfn9, Ccnb2, Cenpm
## Negative: Tff3, Agr3, Ttr, Tmem176a, Casp6, Agr2, Uqcr11, Pzp, Uqcrb, Ppia
## Ndufa4, Glod5, D930028M14Rik, Homer2, Tle1, Cyp2j6, Cpn1, Onecut3, Uqcrq, Trf
## Atp5me, Crp, Qpct, Ndufb10, Lypd8, Cox5b, Tmed6, Cyb5a, Rpl19, Atp5mj
## PC_ 5
## Positive: Agr2, Agr3, Mgll, Phgr1, Onecut3, Serpina1e, Tmsb4x, Pzp, Homer2, Klf4
## Tle1, Hspb1, Galnt13, D930028M14Rik, Cit, S100a10, Crp, Cpn1, Gm12511, Ckb
## Serpina1b, Lamb3, Kcnk16, Slc43a2, Gm53467, Glod5, Slc41a2, Cyp2j5, Trpm2, Fgl2
## Negative: Mrln, Scarb1, Phlda1, Rgs4, Rfx6, Rbp2, 2010204K13Rik, Fam167a, Ass1, 4933407L21Rik
## Nherf4, Trpc5, Slfn10-ps, Isl1, Plcxd3, Gamt, Th, Sfrp5, Prkg2, Fmo1
## Gm32255, Plxnc1, Cth, Pls3, Ntrk1, Gsto1, Gm54263, Acvr1c, Alpl, Prps1
We can then create a 2D representation of our data using a UMAP .
## 13:32:03 UMAP embedding parameters a = 0.9922 b = 1.112
## 13:32:03 Read 1676 rows and found 30 numeric columns
## 13:32:03 Using Annoy for neighbor search, n_neighbors = 30
## 13:32:03 Building Annoy index with metric = cosine, n_trees = 50
## 0% 10 20 30 40 50 60 70 80 90 100%
## [----|----|----|----|----|----|----|----|----|----|
## **************************************************|
## 13:32:03 Writing NN index file to temp file /var/folders/0_/8lnwn5sn7hn9nr8qv2jg48nw0000gn/T//RtmpUTDURW/file16eb7b62b3a5
## 13:32:03 Searching Annoy index using 1 thread, search_k = 3000
## 13:32:03 Annoy recall = 100%
## 13:32:05 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 30
## 13:32:06 Initializing from normalized Laplacian + noise (using RSpectra)
## 13:32:06 Commencing optimization for 500 epochs, with 65834 positive edges
## 13:32:09 Optimization finished
And we plot this using the FeaturePlot function overlaying the percent.mt.
We can then perform a louvain clustering using the FindNeighbours and FindClusters functions.
The cluster labels will be stored in active.idents slots.
## Computing nearest neighbor graph
## Computing SNN
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 1676
## Number of edges: 54373
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8623
## Number of communities: 12
## Elapsed time: 0 seconds
## AAACCCACACAATGTC-1 AAACCCACACGAGAAC-1 AAACGAAAGCCTTGAT-1 AAACGAACATGTCAGT-1
## 6 5 1 1
## AAACGAAGTAGACGTG-1 AAACGAAGTAGCTGTT-1 AAACGAAGTGTATACC-1 AAACGCTAGAGTTGAT-1
## 6 6 4 2
## AAACGCTCATCGCTCT-1 AAAGAACCAGACTCTA-1 AAAGAACCATATCGGT-1 AAAGAACCATCCTCAC-1
## 1 5 1 6
## AAAGAACGTACCTAAC-1 AAAGAACGTTGGACCC-1 AAAGAACTCGTGCACG-1 AAAGGATCATGTTCGA-1
## 4 2 5 2
## AAAGGTACAACTTGGT-1 AAAGGTACATAGGCGA-1 AAAGTCCCATGGCACC-1 AAAGTCCTCTTCGATT-1
## 2 0 5 0
## AAAGTGAAGATGACAT-1 AAATGGAAGTGTAGTA-1 AAATGGACATGCTGCG-1 AACAAAGGTCCGGTGT-1
## 2 9 3 6
## AACAAAGTCGCAATTG-1 AACAAGAGTGAGTGAC-1 AACAGGGCATGTCAGT-1 AACCAACAGCCTGTGC-1
## 0 2 8 2
## AACCAACAGGTCATAA-1 AACCAACGTTGCGGAA-1 AACCAACTCCTATGGA-1 AACCACAAGGTGGTTG-1
## 0 1 5 3
## AACCACAAGTAACCTC-1 AACCACACATCCAATG-1 AACCACAGTCCAGGTC-1 AACCACAGTCGACGCT-1
## 6 5 0 1
## AACCCAAAGATGAACT-1 AACCCAATCCAGCAAT-1 AACCTGACACTCTCGT-1 AACCTGAGTGCATGAG-1
## 8 3 9 2
## AACGAAAAGGGACTGT-1 AACGAAACAAGAAATC-1 AACGGGACACTCACTC-1 AACGGGATCTCGCTCA-1
## 0 0 0 9
## AACGTCAAGCGATGGT-1 AACGTCAGTGTCCACG-1 AACGTCATCTGGTCAA-1 AACTTCTTCATCACTT-1
## 5 6 8 9
## AAGAACAAGGGCAGAG-1 AAGACTCAGAAATTGC-1 AAGACTCAGCAGCGAT-1 AAGACTCCAACCGTGC-1
## 3 1 4 10
## AAGACTCCAAGTGCAG-1 AAGACTCTCCCGTAAA-1 AAGACTCTCTAAGGAA-1 AAGATAGAGGGCAATC-1
## 1 9 4 0
## AAGCATCCACATGTTG-1 AAGCATCTCAGATTGC-1 AAGCATCTCTCGCAGG-1 AAGCGTTAGGCCGCTT-1
## 2 0 2 3
## AAGCGTTAGGGCAGTT-1 AAGCGTTCATTCTTCA-1 AAGGAATAGCCTAGGA-1 AAGGTAAGTAATCAAG-1
## 2 3 0 1
## AAGTCGTAGGTTCAGG-1 AAGTCGTTCATCCCGT-1 AAGTCGTTCTGTTGGA-1 AAGTGAAAGCCGGATA-1
## 7 0 5 3
## AAGTGAACACCCTGAG-1 AAGTGAAGTTTGGCTA-1 AATCACGAGAGCGACT-1 AATCACGCAAGCGCTC-1
## 4 8 4 11
## AATCACGCAATGAGCG-1 AATCACGCAGACAAGC-1 AATCGACGTGCGGATA-1 AATCGACTCATCGTAG-1
## 0 4 0 3
## AATCGACTCTAAGGAA-1 AATGACCAGGGTGAAA-1 AATGACCCAGAGTAAT-1 AATGACCGTCTGTAGT-1
## 4 1 2 8
## AATGACCGTGGTCCCA-1 AATGCCACATAACTCG-1 AATGCCATCCGGCTTT-1 AATGGAAAGCGTATGG-1
## 9 1 7 1
## AATGGAAGTCAGTTTG-1 AATTCCTTCTCCCATG-1 AATTTCCGTCATATGC-1 ACAAAGAAGCTACTGT-1
## 0 3 3 2
## ACAAAGACAACTCATG-1 ACAAAGACAGATAAAC-1 ACAAAGAGTACTGCGC-1 ACAACCACAACACTAC-1
## 4 5 6 6
## ACAACCACAATTCGTG-1 ACAACCACACCCAACG-1 ACAACCATCAAGCCAT-1 ACAACCATCCTACAAG-1
## 4 7 11 4
## ACAAGCTAGCCAAGCA-1 ACAAGCTCACAGAGAC-1 ACAAGCTCACCTGATA-1 ACAAGCTCAGCGTACC-1
## 0 9 0 0
## ACAAGCTGTTATTCCT-1 ACAAGCTTCGGTTAGT-1 ACACAGTAGCACGTCC-1 ACACCAATCATCCTGC-1
## 2 6 8 1
## ACACCAATCCACCTGT-1 ACACGCGCAATAGGGC-1 ACACTGAGTGCCAAGA-1 ACAGAAACACCGAATT-1
## 5 0 0 5
## ACAGAAACAGAGGACT-1 ACAGCCGAGGCCGCTT-1 ACAGGGAAGAAACTGT-1 ACAGGGAGTTGAATCC-1
## 1 4 10 9
## ACATCCCAGCAAGCCA-1 ACATCCCAGGCTAGCA-1 ACATCCCCATCAGTGT-1 ACATGCACATCTTTCA-1
## 0 1 6 4
## ACATGCAGTACTGCGC-1 ACATGCAGTCCTGGGT-1 ACATGCAGTGCATACT-1 ACATTTCAGTCACTGT-1
## 5 4 1 5
## ACATTTCGTATTTCTC-1 ACATTTCTCCGCACTT-1 ACATTTCTCGTGTGGC-1 ACCAAACAGACCTCAT-1
## 5 2 2 1
## ACCAACAAGCCGCTTG-1 ACCAACAGTCGGCTAC-1 ACCACAAGTCGAAGCA-1 ACCACAAGTTTGGAAA-1
## 6 0 8 2
## ACCACAATCACAAGGG-1 ACCATTTAGACGACTG-1 ACCATTTAGGTACAGC-1 ACCATTTAGTCGTCTA-1
## 7 1 11 2
## ACCATTTGTAGTCCTA-1 ACCATTTGTATGTCCA-1 ACCATTTGTGACTGTT-1 ACCCTCATCCTCGCAT-1
## 7 1 0 2
## ACCTACCCAGCGTTGC-1 ACCTACCTCAAGCTTG-1 ACCTACCTCGACGACC-1 ACCTACCTCGTTTACT-1
## 3 6 3 6
## ACCTGAAAGCAACAGC-1 ACCTGAACAATTGCGT-1 ACCTGAAGTCTGGTTA-1 ACGATCAGTCCGTTTC-1
## 5 2 2 10
## ACGATGTCAAGTTCGT-1 ACGATGTCATGTGCTA-1 ACGATGTTCGAAATCC-1 ACGATGTTCTATCGCC-1
## 0 2 0 5
## ACGCACGCAGGTTACT-1 ACGGAAGCAAGGCCTC-1 ACGGAAGCAGGTATGG-1 ACGGGTCAGGCTGGAT-1
## 6 1 7 3
## ACGGGTCAGTAGCTCT-1 ACGGGTCGTCACTCAA-1 ACGGTCGAGCGCTGAA-1 ACGGTCGGTATCGATC-1
## 7 0 3 9
## ACGGTTAAGCGTGCCT-1 ACGGTTAAGCTCGAAG-1 ACGGTTAGTCGTAATC-1 ACGTACAAGCAGGTCA-1
## 10 4 4 0
## ACGTACATCGTTGTGA-1 ACGTAGTAGGGCGAGA-1 ACGTAGTCATCACAGT-1 ACGTAGTGTCCTATAG-1
## 2 6 0 0
## ACGTCCTAGCATCAGG-1 ACGTCCTGTGGCTACC-1 ACGTCCTGTTAAGGGC-1 ACGTTCCCACTATGTG-1
## 1 4 1 5
## ACTATCTGTGGAGAAA-1 ACTATGGGTTCTCGCT-1 ACTATTCTCAAACCCA-1 ACTCCCATCTAGCATG-1
## 3 0 1 2
## ACTCTCGGTCGTGGTC-1 ACTCTCGGTGGAGAAA-1 ACTGATGCAAATGGAT-1 ACTGCAAAGGACTGGT-1
## 7 2 5 3
## ACTGCAAAGGTATCTC-1 ACTGTCCCATCTGGGC-1 ACTGTCCGTCGCACGT-1 ACTGTGACACCCTAAA-1
## 3 8 0 1
## ACTGTGATCTAACACG-1 ACTTAGGAGCAGCGAT-1 ACTTAGGAGTGAGTTA-1 ACTTAGGCAAGTGGTG-1
## 2 6 2 1
## ACTTAGGGTCCAGCCA-1 ACTTATCGTATCGCTA-1 ACTTATCGTTAGAAGT-1 ACTTATCTCCTGTAGA-1
## 3 5 0 1
## ACTTCCGGTACTGCGC-1 ACTTCCGTCTCCCATG-1 ACTTCGCCAGGTTCAT-1 ACTTCGCGTAGAGACC-1
## 3 3 0 4
## ACTTTGTAGACGCTCC-1 ACTTTGTAGTAAATGC-1 ACTTTGTGTAGAATGT-1 AGAAATGGTCATCGGC-1
## 2 1 1 3
## AGAAATGGTTCAGGTT-1 AGAACAAAGAGTGTGC-1 AGAACAAAGGGCAATC-1 AGAACCTGTAGACGGT-1
## 9 9 1 4
## AGAACCTGTGCCTTTC-1 AGAACCTTCAGTCTTT-1 AGAAGTAAGAAGCGCT-1 AGAAGTACATGGGCAA-1
## 3 3 8 6
## AGACAAACATCGCCTT-1 AGACAAATCCCGAATA-1 AGACACTGTAACCCTA-1 AGACACTTCCTACCAC-1
## 5 2 1 5
## AGACAGGCATCGTGCG-1 AGACCATAGCGACCCT-1 AGACCCGTCGAACACT-1 AGACCCGTCGTTGTTT-1
## 4 0 1 0
## AGAGAATCAACTGGTT-1 AGAGAATTCCCGTGTT-1 AGAGAGCTCAGATGCT-1 AGAGCAGAGGTTCTTG-1
## 1 2 4 0
## AGAGCAGAGTTTGGCT-1 AGAGCAGCAAGGACAC-1 AGAGCAGCAGCACCCA-1 AGAGCAGGTGATGTAA-1
## 1 4 11 8
## AGAGCCCAGCCACAAG-1 AGATAGAAGGATGTTA-1 AGATCCAAGCCTCGTG-1 AGATCCATCAGCTTCC-1
## 0 0 0 1
## AGATCGTCATTGAGCT-1 AGATCGTGTAGCGAGT-1 AGATGAATCAAACGAA-1 AGATGCTAGCATCAAA-1
## 1 1 1 1
## AGATGCTAGGTGCAGT-1 AGATGCTAGTAGAGTT-1 AGATGCTCATGTCAGT-1 AGATGCTGTACTCGCG-1
## 1 6 5 2
## AGCATCAGTCCTGTTC-1 AGCCAATGTCATTCCC-1 AGCCAGCGTTAGGCTT-1 AGCGCCAAGAGGGCGA-1
## 2 6 4 5
## AGCGCCACATAATCGC-1 AGCGCCAGTATAGCTC-1 AGCGTATCAAATTGGA-1 AGCGTATGTTGCCATA-1
## 1 5 2 6
## AGCGTATTCTGGCCGA-1 AGCGTCGCATATCGGT-1 AGCTACAGTCGGATTT-1 AGCTACATCGTCAAAC-1
## 10 0 0 0
## AGCTCAAAGCAGATAT-1 AGCTCAAAGTCTGTAC-1 AGCTCAACACCTTCGT-1 AGCTCAATCGTTCGCT-1
## 1 7 10 6
## AGCTTCCAGAAGCTCG-1 AGCTTCCAGGTGCCTC-1 AGCTTCCTCAAGATAG-1 AGCTTCCTCGTGTTCC-1
## 4 1 2 10
## AGGAAATAGAAGCTGC-1 AGGAAATTCATTCCTA-1 AGGAATATCTGTAACG-1 AGGACGACATGGGAAC-1
## 8 6 0 3
## AGGACTTAGTCCTACA-1 AGGACTTCAGAGGACT-1 AGGAGGTAGGAGCAAA-1 AGGAGGTGTTATGGTC-1
## 0 4 7 0
## AGGATAAGTGTAGCAG-1 AGGATCTAGACGTCCC-1 AGGATCTCAACGATCT-1 AGGCCACAGTTCATCG-1
## 4 1 11 8
## AGGCCACCATCAGTCA-1 AGGCCACCATGGGTCC-1 AGGGCCTAGTACAACA-1 AGGGCCTAGTTGCATC-1
## 6 0 6 3
## AGGGCCTCAGATTAAG-1 AGGGCCTTCCGTTGAA-1 AGGGCCTTCGTAGGGA-1 AGGGCTCAGGACCCAA-1
## 8 2 0 1
## AGGGCTCCAGCTCCTT-1 AGGGCTCGTAGCGCCT-1 AGGGCTCTCTCGACGG-1 AGGGTCCTCGTTACCC-1
## 0 6 0 10
## AGGGTGAAGCATACTC-1 AGGTAGGCACGTAACT-1 AGGTAGGGTACGTACT-1 AGGTAGGGTGAGCCAA-1
## 0 1 1 7
## AGGTCTATCGATTTCT-1 AGGTTACGTACAGGTG-1 AGTAACCCAGACTGCC-1 AGTACCAGTCGCGTTG-1
## 5 0 2 1
## AGTACTGAGGGTTTCT-1 AGTACTGCACTTGACA-1 AGTACTGGTATGGTAA-1 AGTAGTCGTGAGTGAC-1
## 0 8 9 9
## AGTAGTCGTTGTGTTG-1 AGTCAACAGAGTGTGC-1 AGTCAACCACTAACCA-1 AGTCACAAGCGATGAC-1
## 3 5 4 10
## AGTCATGCAGCACAAG-1 AGTCATGTCTCCCAAC-1 AGTCTCCCATGCCGAC-1 AGTCTCCCATGGACAG-1
## 3 3 5 9
## AGTGATCAGGACGCTA-1 AGTTAGCGTTAGGACG-1 AGTTCCCGTTACCGTA-1 ATACCGATCCGTAGTA-1
## 3 4 1 5
## ATACCTTGTCGATTCA-1 ATACCTTTCTAGAGCT-1 ATACCTTTCTCCCATG-1 ATAGACCCACTCGATA-1
## 0 0 3 4
## ATAGACCGTCATCGCG-1 ATAGACCTCATGCCAA-1 ATAGAGACAAGAATGT-1 ATAGGCTGTAACGCGA-1
## 0 0 0 4
## ATAGGCTTCTAGCCAA-1 ATATCCTCAGGCGATA-1 ATCACAGCACCGTGCA-1 ATCACAGCACGATAGG-1
## 1 0 3 11
## ATCACGAAGGTACAAT-1 ATCACGATCGCTAGCG-1 ATCACTTAGGGTTTCT-1 ATCACTTTCCTAAGTG-1
## 5 0 0 1
## ATCAGGTAGGGTGGGA-1 ATCATTCAGCAGGTCA-1 ATCATTCCATAGACTC-1 ATCATTCGTGACTAAA-1
## 10 2 3 0
## ATCATTCGTTCGTTCC-1 ATCCACCAGAGTTCGG-1 ATCCACCCAGTGGCTC-1 ATCCACCGTCATCCCT-1
## 7 4 5 3
## ATCCACCGTGTGACCC-1 ATCCATTCAAGCGATG-1 ATCCATTGTGCCCGTA-1 ATCCATTTCGGCATAT-1
## 3 11 4 2
## ATCCCTGAGGTAAGGA-1 ATCCCTGCAGCACCCA-1 ATCCCTGCATGTGACT-1 ATCCCTGTCTGAGATC-1
## 6 6 1 5
## ATCCGTCCAAGCTGTT-1 ATCCGTCGTCTGTAGT-1 ATCCGTCGTCTTGGTA-1 ATCCGTCTCTGAATGC-1
## 7 9 3 3
## ATCCTATAGTCGGGAT-1 ATCCTATCAAGTGATA-1 ATCCTATTCGTGCATA-1 ATCGATGAGTTAACAG-1
## 0 2 1 0
## ATCGATGCACCAGACC-1 ATCGATGGTGGGAGAG-1 ATCGCCTCAATGTCAC-1 ATCGGATGTAGTATAG-1
## 0 1 0 0
## ATCGTCCCAAGACCTT-1 ATCGTCCCACCGTACG-1 ATCTCTAGTTTCGTAG-1 ATCTTCACACAATCTG-1
## 2 2 3 3
## ATCTTCACACGCCACA-1 ATCTTCACAGACCCGT-1 ATGAAAGCATTGCTTT-1 ATGAAAGTCAAAGAAC-1
## 4 0 0 3
## ATGAAAGTCAGGGTAG-1 ATGAAAGTCATCTACT-1 ATGAAAGTCTCTTAAC-1 ATGACCAAGCGCGTTC-1
## 2 0 0 2
## ATGACCAAGGAACTCG-1 ATGACCACAAGTGCAG-1 ATGACCACATTCACAG-1 ATGACCAGTCGTTGCG-1
## 5 0 7 2
## ATGACCATCCCGAACG-1 ATGAGGGGTACCTAAC-1 ATGAGTCCAAAGACGC-1 ATGAGTCTCTGTACAG-1
## 6 6 3 0
## ATGATCGTCAAATGAG-1 ATGCCTCAGGTGCTGA-1 ATGCCTCTCACGGGAA-1 ATGCGATTCGTTCTCG-1
## 1 8 0 11
## ATGGATCGTGTAGGAC-1 ATGGATCTCCATCCGT-1 ATGGGAGGTGGTGATG-1 ATGGGTTAGCAACCAG-1
## 0 1 3 0
## ATGGTTGGTCAACACT-1 ATGGTTGGTTATCTGG-1 ATGGTTGTCAAGCTGT-1 ATGGTTGTCCCGATCT-1
## 1 5 5 2
## ATGGTTGTCCGCTAGG-1 ATGTCTTCACGGTCTG-1 ATGTCTTCACTGATTG-1 ATGTCTTCATCGCTAA-1
## 2 4 6 9
## ATGTCTTGTCATTCCC-1 ATTACTCTCCGGCTTT-1 ATTATCCAGACATAGT-1 ATTATCCGTATCGATC-1
## 7 0 0 0
## ATTATCCTCCCAAGTA-1 ATTCATCAGGGTAGCT-1 ATTCCATTCCGTCCTA-1 ATTCCTATCTAGGCCG-1
## 0 3 2 0
## ATTCGTTCAGAGTAAT-1 ATTCGTTTCGAGAATA-1 ATTCTACAGCTCGAAG-1 ATTCTACTCACCGGGT-1
## 1 0 5 1
## ATTCTTGAGGCAGGGA-1 ATTCTTGTCAACCTCC-1 ATTGGGTTCGTAGGGA-1 ATTGTTCCATCGAAGG-1
## 8 6 1 9
## ATTGTTCGTCCACGCA-1 ATTGTTCTCACCGACG-1 ATTGTTCTCTCCATAT-1 ATTTACCAGGGCGAGA-1
## 5 8 3 8
## ATTTACCAGTACTGTC-1 ATTTACCGTTGCATTG-1 ATTTCACAGTCCCAGC-1 ATTTCACGTTGGTAGG-1
## 2 0 1 5
## ATTTCTGAGAGCACTG-1 CAAAGAAGTGCCTACG-1 CAAAGAATCGTAGTGT-1 CAACAACAGTAGAGTT-1
## 8 8 5 5
## CAACAACTCTTACCAT-1 CAACAGTTCTCCCATG-1 CAACCAAAGATCCAAA-1 CAACCAAAGTGCACTT-1
## 8 3 6 7
## CAACCTCAGTGCAACG-1 CAACCTCGTATTTCGG-1 CAACCTCGTTGTCAGT-1 CAACGATAGATTGTGA-1
## 7 11 4 1
## CAACGATCAACGTATC-1 CAACGATTCCACTGGG-1 CAACGATTCCCGTGAG-1 CAACGATTCCGAAGGA-1
## 8 5 1 2
## CAACGGCGTAACACCT-1 CAAGACTCACTGAGTT-1 CAAGAGGAGCTACAAA-1 CAAGAGGCAACACGAG-1
## 3 0 5 0
## CAAGAGGGTGGGACAT-1 CAAGCTAAGACGGAAA-1 CAAGCTAGTCTCCTGT-1 CAAGCTAGTGCATGTT-1
## 4 1 8 2
## CAAGCTATCCTTCGAC-1 CAAGGGAAGAGTTGCG-1 CAAGGGACACCGGAAA-1 CAAGGGACACGTGAGA-1
## 0 7 0 5
## CAAGGGAGTAAGAACT-1 CAAGGGATCTTCGTAT-1 CAATACGTCACACCCT-1 CAATCGACAATTGAAG-1
## 1 4 1 3
## CAATGACAGGTTGCCC-1 CAATGACCAAATAAGC-1 CAATGACCATATTCGG-1 CACAACACATGACTCA-1
## 8 3 4 1
## CACACAACACCCTATC-1 CACAGATTCCTAACAG-1 CACAGGCTCTGTCCGT-1 CACATGAAGAGGATGA-1
## 2 4 2 4
## CACATGAGTGCGTTTA-1 CACCAAAGTAATCAGA-1 CACCGTTCATTAGGCT-1 CACCGTTGTCCAGGTC-1
## 1 6 3 1
## CACCGTTTCACTCACC-1 CACCGTTTCTAGACCA-1 CACTGGGAGCTACTGT-1 CACTGGGGTAATGTGA-1
## 3 7 3 4
## CACTGTCCATTAGGAA-1 CACTTCGGTAAGTTGA-1 CACTTCGTCAGCATTG-1 CAGATACAGGAACGAA-1
## 8 2 3 2
## CAGATACGTAGGAGTC-1 CAGATACGTGCACATT-1 CAGATCAAGTTCACTG-1 CAGATCACAAGGTACG-1
## 2 0 1 6
## CAGATTGAGCGCCATC-1 CAGCAATGTGAGCAGT-1 CAGCCAGAGCTGGAGT-1 CAGCCAGCACATCCCT-1
## 0 9 2 4
## CAGCCAGGTTCAAGGG-1 CAGCCAGTCAGTGGGA-1 CAGCGTGAGCTTGTGT-1 CAGCGTGCACCCATAA-1
## 5 7 10 4
## CAGCGTGGTTGCATGT-1 CAGCGTGGTTGCTCGG-1 CAGGCCACACGAAGAC-1 CAGGCCACAGGTCAAG-1
## 0 11 1 2
## CAGGGCTAGAGAGCAA-1 CAGGGCTTCATTCACT-1 CAGGTATCATGGCCCA-1 CAGTGCGCACATACGT-1
## 4 3 4 0
## CAGTGCGGTCATTCCC-1 CAGTTAGGTAATCAGA-1 CAGTTCCCACCTGAAT-1 CAGTTCCGTGCATGTT-1
## 6 2 0 3
## CAGTTCCGTGTTTACG-1 CATAAGCGTATGGAAT-1 CATACAGGTTGGAGAC-1 CATACTTCAAGAGATT-1
## 0 9 4 0
## CATACTTGTCACCGCA-1 CATACTTGTCCCTGTT-1 CATAGACAGGGTGGGA-1 CATAGACGTAGATCGG-1
## 10 3 1 6
## CATAGACGTCAATGGG-1 CATCAAGCATAGATGA-1 CATCAAGTCGCTATTT-1 CATCCACAGAAGCGCT-1
## 0 8 0 0
## CATCCACAGCGTGCTC-1 CATCCGTGTAGGAGGG-1 CATCGCTCAGGGATAC-1 CATCGGGAGAGCAAGA-1
## 1 5 10 9
## CATCGGGTCTCGCGTT-1 CATCGTCAGAAGCGCT-1 CATCGTCTCCGGCAGT-1 CATGAGTAGTTCGGTT-1
## 0 3 1 7
## CATGAGTTCTTCCGTG-1 CATGCAATCTCGCCTA-1 CATGCCTCACTTGAGT-1 CATGCCTCAGTGGTGA-1
## 4 4 1 3
## CATGCCTTCATCACTT-1 CATGCGGTCTAGCAAC-1 CATGCTCGTTGTACGT-1 CATGCTCTCTGAGCAT-1
## 5 4 7 0
## CATGGATAGAGTCTGG-1 CATGGATAGGTAGACC-1 CATGGATCATCTTCGC-1 CATGGATGTAGATTGA-1
## 0 0 0 1
## CATGGATTCCAGTTCC-1 CATGGTACAAATCAAG-1 CATTCTAGTAGAATAC-1 CATTGAGTCCCGAGTG-1
## 2 7 1 6
## CATTGCCCAGATGCGA-1 CATTGTTAGCTTTGTG-1 CATTGTTGTTGCTAGT-1 CATTTCAGTAATGATG-1
## 0 2 5 4
## CATTTCAGTCGAAGCA-1 CCAAGCGGTAGCCCTG-1 CCAATGAAGATGCGAC-1 CCAATGAAGGAACTAT-1
## 4 3 5 10
## CCAATGATCTGCTGAA-1 CCAATTTGTGTCGCTG-1 CCACAAACATGCGGTC-1 CCACACTGTATCGAGG-1
## 1 7 7 1
## CCACCATAGAATTTGG-1 CCACCATAGTATAACG-1 CCACGAGTCGCCTATC-1 CCACGAGTCTTAGCCC-1
## 6 0 2 5
## CCACGTTTCCATCAGA-1 CCACGTTTCTATCCAT-1 CCACTTGAGGGTACAC-1 CCACTTGTCACTAGCA-1
## 0 0 1 1
## CCACTTGTCTGAGATC-1 CCATCACCAACCGATT-1 CCCAACTTCCCGAAAT-1 CCCGAAGAGACAACAT-1
## 1 1 0 2
## CCCGAAGAGTGAGTGC-1 CCCGAAGTCCGATGCG-1 CCCGGAACAACTAGAA-1 CCCGGAACATGTGGTT-1
## 4 10 3 1
## CCCTAACAGGATCACG-1 CCCTAACCACAGTCAT-1 CCCTAACCATTAAAGG-1 CCCTAACGTGTTCAGT-1
## 2 1 9 7
## CCCTAACTCATCTACT-1 CCCTAACTCTGCGGAC-1 CCCTCAAGTGACACGA-1 CCCTCAATCCCATAAG-1
## 7 6 8 2
## CCCTCAATCCCTTCCC-1 CCCTCTCGTGAGATAT-1 CCCTTAGGTCTCTCTG-1 CCCTTAGTCATGGCCG-1
## 5 10 3 0
## CCCTTAGTCGTTCATT-1 CCGAACGGTAAGCTCT-1 CCGATCTCACGCGTCA-1 CCGATCTTCAGAGTTC-1
## 7 0 3 4
## CCGATGGGTTGGGCCT-1 CCGATGGTCCGGGACT-1 CCGCAAGCAGACCTGC-1 CCGCAAGGTGTTGATC-1
## 5 4 3 3
## CCGGACAAGGTTTGAA-1 CCGGGTAAGTGTAGAT-1 CCGGTGAAGGCCACCT-1 CCGGTGACACAACCGC-1
## 9 9 3 7
## CCGGTGAGTCAAAGAT-1 CCGTGAGAGATTACCC-1 CCGTGAGCAAGCCTGC-1 CCGTGAGCATCCGCGA-1
## 2 7 1 2
## CCGTTCATCAGATTGC-1 CCTAACCTCACTACGA-1 CCTAAGAAGAGGTCGT-1 CCTAAGACAGTAGGAC-1
## 2 2 8 1
## CCTAAGACAGTGTGGA-1 CCTAAGACATGGCCCA-1 CCTAAGAGTATTTCCT-1 CCTACGTAGCGAGGAG-1
## 2 2 9 0
## CCTACGTAGTGGCCTC-1 CCTACGTCACGCGTCA-1 CCTACGTTCAAACTGC-1 CCTATCGAGTATTGCC-1
## 2 2 0 1
## CCTATCGCAGTTGAAA-1 CCTCAACAGATTGGGC-1 CCTCAACTCGCACGAC-1 CCTCAACTCTCTGCTG-1
## 7 0 5 0
## CCTCAGTCATCCTATT-1 CCTCAGTCATCGGATT-1 CCTCAGTCATGGGAAC-1 CCTCATGCACTTCATT-1
## 2 5 4 1
## CCTCATGTCGTTCCCA-1 CCTCCAACAAGCTCTA-1 CCTCCAACACTGCACG-1 CCTCCAAGTGCCTGCA-1
## 5 11 0 0
## CCTCCTCTCACTTGTT-1 CCTCCTCTCATCCTAT-1 CCTCTAGAGAGCCATG-1 CCTCTAGCATGCAGGA-1
## 0 10 0 8
## CCTCTCCGTGACCGTC-1 CCTCTCCTCGTGGGAA-1 CCTGCATAGCTAAACA-1 CCTGCATCAGTCGTTA-1
## 1 5 1 1
## CCTGCATGTCAGACGA-1 CCTGCATTCGATTCCC-1 CCTGCATTCTCCAATT-1 CCTTCAGCAGCTACTA-1
## 10 3 2 6
## CCTTGTGAGCGTGCTC-1 CCTTGTGCAAGCAGGT-1 CCTTGTGCAGATCATC-1 CCTTGTGGTCGCATGC-1
## 9 5 0 1
## CCTTGTGTCACCTGGG-1 CCTTGTGTCCACGTCT-1 CCTTGTGTCGTTCCTG-1 CCTTTGGAGAGCAGCT-1
## 3 6 0 0
## CCTTTGGAGGGCTTCC-1 CCTTTGGGTCACAGTT-1 CCTTTGGTCAGCTTCC-1 CGAAGGAAGGATATAC-1
## 7 2 0 1
## CGAAGGATCGAATGCT-1 CGAATTGAGTGCTCAT-1 CGAATTGCATTCTCTA-1 CGAGAAGGTGACAGCA-1
## 6 7 6 0
## CGAGGAAAGGCAGGTT-1 CGAGGAAGTAGTGCGA-1 CGAGGCTGTCCGTACG-1 CGAGTGCAGCGCTTCG-1
## 0 0 2 3
## CGAGTGCCAACTAGAA-1 CGAGTGCGTTCGGTTA-1 CGAGTTAAGCGACCCT-1 CGAGTTAAGCGCACAA-1
## 2 2 9 2
## CGAGTTAGTGTGTCCG-1 CGAGTTATCATGCCCT-1 CGAGTTATCTGTCGTC-1 CGATCGGAGATGCGAC-1
## 1 1 6 8
## CGATCGGTCCCTCAAC-1 CGATCGGTCGGACTTA-1 CGATGCGTCTTAGCAG-1 CGATGGCAGGTCTACT-1
## 3 0 2 3
## CGATGGCCACTGGCGT-1 CGCAGGTCACTGGCGT-1 CGCATAAGTTCTCCCA-1 CGCATAATCTCCCATG-1
## 3 1 6 3
## CGCATGGAGATGTTAG-1 CGCATGGAGCCTAGGA-1 CGCCAGATCGGTGAAG-1 CGCCATTGTTGGATCT-1
## 5 8 0 4
## CGGAACCCAGGCTTGC-1 CGGAACCGTGAGTAAT-1 CGGAACCTCTTAATCC-1 CGGAATTTCTCCTACG-1
## 5 6 2 3
## CGGAGAAAGCGCCTTG-1 CGGAGAACAGGACATG-1 CGGGACTTCAAATAGG-1 CGGGCATCACCACATA-1
## 2 0 0 0
## CGGGTCATCGTTCTCG-1 CGGGTGTAGTTGAATG-1 CGGTCAGAGTCGGCAA-1 CGGTCAGCAGAGCGTA-1
## 4 0 4 10
## CGGTCAGTCATGGCCG-1 CGTAAGTTCAATCTTC-1 CGTAATGGTTAGGCCC-1 CGTAATGTCGGCATCG-1
## 10 6 4 3
## CGTAGTACAGTTAGGG-1 CGTCAAACAGTCAACT-1 CGTCAAAGTCATAGTC-1 CGTCCATAGAACCGCA-1
## 0 3 1 10
## CGTCCATAGGCATGCA-1 CGTCCATAGTCAGCCC-1 CGTCCATTCACTACGA-1 CGTGAATGTCCATACA-1
## 3 2 1 5
## CGTGCTTCACTGGAAG-1 CGTGCTTGTGTCCACG-1 CGTGCTTGTTGGCTAT-1 CGTGTCTCAAAGCTAA-1
## 0 1 8 0
## CGTGTCTCAGGTGGAT-1 CGTGTCTCATCTCAAG-1 CGTGTCTGTCACAATC-1 CGTGTCTTCGGCTTGG-1
## 0 6 9 7
## CGTTAGATCGAGTCTA-1 CGTTCTGCAGAGTTGG-1 CGTTCTGGTCAGTCCG-1 CGTTCTGGTGGCTCTG-1
## 5 8 5 1
## CGTTGGGAGCCGATCC-1 CGTTGGGGTGACCTGC-1 CGTTGGGTCAGATTGC-1 CTAACCCAGTCATTGC-1
## 2 9 3 6
## CTAACCCCACGTTGGC-1 CTAACTTAGGGAACAA-1 CTAACTTAGGTATAGT-1 CTAAGTGAGGTAGCAC-1
## 1 3 4 1
## CTAAGTGCAGCGCTTG-1 CTAAGTGGTCATGCAT-1 CTAAGTGGTCCTGTTC-1 CTAAGTGGTGGCAACA-1
## 5 9 3 0
## CTAAGTGTCGCCAACG-1 CTAAGTGTCGTTGTTT-1 CTACAGAAGTAAGACT-1 CTACAGAGTCTACACA-1
## 8 0 7 3
## CTACAGATCAGCACCG-1 CTACATTTCTCTAAGG-1 CTACCTGAGTTGGGAC-1 CTACGGGCAACCGTGC-1
## 0 2 1 2
## CTACGGGCATGGACAG-1 CTAGACACATCCTTCG-1 CTAGGTAAGTGAGTTA-1 CTAGGTAGTGATTAGA-1
## 1 2 2 1
## CTATAGGCAGCACAGA-1 CTATAGGGTCAGCGTC-1 CTATAGGGTCCCACGA-1 CTATAGGGTTTCGACA-1
## 0 1 0 3
## CTATCCGTCGTCTAAG-1 CTCAAGATCAAGCTGT-1 CTCAATTAGATAGCAT-1 CTCAATTGTAGACTGG-1
## 4 0 5 0
## CTCAATTTCGTTCTGC-1 CTCACTGCAGTCTTCC-1 CTCACTGTCATCGCTC-1 CTCACTGTCTGACGCG-1
## 0 0 0 3
## CTCAGAAGTAATGCTC-1 CTCAGAATCCGTTGGG-1 CTCAGGGAGATAGCAT-1 CTCAGGGGTACCCAGC-1
## 7 3 3 5
## CTCAGTCCAACCGCTG-1 CTCATCGCAACATACC-1 CTCATCGCAGCAGTGA-1 CTCATCGGTCGTGCCA-1
## 1 5 0 0
## CTCATCGGTTGAATCC-1 CTCATGCGTTAATCGC-1 CTCATTAAGTTTGTCG-1 CTCATTACAACCACGC-1
## 0 0 0 7
## CTCATTAGTTCATCTT-1 CTCATTATCAGTAGGG-1 CTCATTATCGGCATTA-1 CTCCAACCACACCTAA-1
## 1 3 0 1
## CTCCACATCTCCCATG-1 CTCCATGAGCGTGAAC-1 CTCCCAAGTACTAACC-1 CTCCCAATCTTCGGTC-1
## 3 3 0 3
## CTCCCTCAGATCCGAG-1 CTCCCTCCAACTTCTT-1 CTCCCTCCATCTCAAG-1 CTCCGATAGTACCATC-1
## 0 1 2 7
## CTCCGATGTAGGGAGG-1 CTCCTCCAGGTTAAAC-1 CTCCTCCCATGACGAG-1 CTCCTTTAGCACCGTC-1
## 11 0 1 0
## CTCCTTTCACGGCGTT-1 CTCCTTTGTTTACCTT-1 CTCCTTTTCCTACGGG-1 CTCGAGGCACGTGTGC-1
## 5 2 7 0
## CTCGAGGGTCGCAGTC-1 CTCTCAGAGTTTGCTG-1 CTCTCGACAGAACTTC-1 CTCTCGAGTCGTGATT-1
## 1 0 4 0
## CTCTCGAGTGACTAAA-1 CTCTGGTAGCCAGAGT-1 CTCTGGTCAGAAACCG-1 CTCTGGTGTGCGTTTA-1
## 2 1 3 2
## CTCTGGTTCGATGCAT-1 CTGAATGCACGACCTG-1 CTGAATGCACGCAAAG-1 CTGAGGCGTGTTCGTA-1
## 0 0 0 0
## CTGCAGGAGTCTGCAT-1 CTGCAGGGTATGTCAC-1 CTGCATCAGACATAGT-1 CTGCATCTCCCTCTTT-1
## 3 2 4 3
## CTGCCATCACGTCATA-1 CTGCCATGTGTCTTAG-1 CTGCCTAAGCTCATAC-1 CTGCCTAAGTTGAAAC-1
## 3 5 7 2
## CTGCCTACAGGTCAAG-1 CTGCCTATCTCATAGG-1 CTGCGAGAGGGCCTCT-1 CTGCGAGAGGTCACTT-1
## 1 4 11 0
## CTGCGAGAGTGGAAAG-1 CTGGACGAGTCTCGTA-1 CTGGCAGGTAGCACGA-1 CTGGCAGTCTCCCATG-1
## 1 3 0 3
## CTGGTCTTCCCAGGCA-1 CTGTACCAGCGCGTTC-1 CTGTACCGTTTCTTAC-1 CTGTAGAAGCACTCCG-1
## 9 0 0 8
## CTGTAGAAGGTAGACC-1 CTGTAGAAGGTGCATG-1 CTGTATTCATTCACCC-1 CTGTCGTCAAGAGTAT-1
## 5 0 1 8
## CTGTCGTCATATGCGT-1 CTGTCGTGTTGCGGCT-1 CTGTCGTGTTTGACAC-1 CTGTGAAAGGAGTATT-1
## 0 0 0 11
## CTGTGAATCTGGCTGG-1 CTGTGGGTCACCGCTT-1 CTTACCGCAAGACCGA-1 CTTACCGCATACATCG-1
## 11 7 6 4
## CTTAGGAAGTGGCCTC-1 CTTAGGAGTATACGGG-1 CTTAGGAGTCATCCGG-1 CTTAGGATCCCATAAG-1
## 10 4 9 5
## CTTCCGAAGCGGACAT-1 CTTCCTTAGCCAAGGT-1 CTTCCTTTCAGCATTG-1 CTTCGGTAGACCCGCT-1
## 5 3 3 1
## CTTCGGTCAGCTGTCG-1 CTTCGGTTCCCTTGTG-1 CTTCTAAAGGCTGTAG-1 CTTCTAATCACCCTGT-1
## 0 1 1 1
## CTTCTAATCGACTCCT-1 CTTCTCTAGTAGCATA-1 CTTCTCTGTAAGTAGT-1 CTTCTCTTCATGGCCG-1
## 5 7 3 4
## CTTGATTAGAGGCGTT-1 CTTGATTGTGCGCTCA-1 CTTTCAAGTCTGTCAA-1 GAAACCTAGAGAGAAC-1
## 0 0 2 10
## GAAACCTGTGTTGACT-1 GAAACCTTCAGGCGAA-1 GAAACCTTCGCAATTG-1 GAAATGAGTGCCTGAC-1
## 6 2 0 4
## GAAATGAGTTAGGCCC-1 GAACACTAGTGCCTCG-1 GAACACTCAGCTGTTA-1 GAACGTTAGGGATCGT-1
## 6 0 2 4
## GAACGTTGTCGCATTA-1 GAACTGTAGACTCCGC-1 GAACTGTCAATAGTGA-1 GAACTGTGTGTGAATA-1
## 11 2 2 0
## GAAGAATTCGGATAAA-1 GAAGCCCGTGACTAAA-1 GAAGCCCTCCGAACGC-1 GAAGCGAAGGTTCTTG-1
## 4 1 4 1
## GAAGCGAGTACTGCGC-1 GAAGGACCATGTGGTT-1 GAAGTAACATTACGGT-1 GAAGTAAGTCTGTAGT-1
## 3 0 1 3
## GAATAGAGTGGAATGC-1 GAATCACCAACAGCTT-1 GAATCACCATGACTCA-1 GAATCGTAGAACTGAT-1
## 0 7 0 4
## GAATCGTTCCGTCACT-1 GACACGCCAGCATGCC-1 GACACGCCATTGACAC-1 GACAGCCGTGTTCATG-1
## 0 3 0 1
## GACATCATCCAACCGG-1 GACCAATAGTAAACAC-1 GACCAATCACCCTAAA-1 GACCAATTCGCACTCT-1
## 3 0 3 0
## GACCCAGAGTTCCGTA-1 GACCCAGTCGTACACA-1 GACCGTGGTAGATTGA-1 GACCGTGGTAGCTGTT-1
## 0 5 2 2
## GACCGTGTCAGTCATG-1 GACCGTGTCTACCTTA-1 GACGCTGGTGGTTCTA-1 GACGCTGGTTCGTGCG-1
## 4 3 0 4
## GACGTTAAGCTGAGCA-1 GACTATGCAACAGCTT-1 GACTATGCACAACGAG-1 GACTATGCACAGCCTG-1
## 8 3 1 0
## GACTCTCGTAGAGTTA-1 GAGAAATCAGCGATTT-1 GAGAAATTCCAAACCA-1 GAGACCCGTACAGTTC-1
## 0 10 0 1
## GAGAGGTCAGGCGTTC-1 GAGATGGTCGCCGATG-1 GAGCCTGTCGCCTATC-1 GAGCTGCCACTAGGCC-1
## 1 2 1 8
## GAGCTGCTCCGTGCGA-1 GAGGCAACAAGTGGCA-1 GAGGCAACATCCTATT-1 GAGGCCTTCCGCAACG-1
## 3 0 1 3
## GAGGGATAGGTGATAT-1 GAGGGATCACTGGATT-1 GAGGGATGTCTAACTG-1 GAGGGTACATGGATCT-1
## 1 9 2 6
## GAGGGTAGTGTTGACT-1 GAGGGTAGTTAGGGAC-1 GAGTCTACAGCGCTTG-1 GAGTGAGGTGGCACTC-1
## 3 5 2 5
## GAGTGAGTCTTACGGA-1 GAGTGTTTCATTGCCC-1 GAGTGTTTCATTGCGA-1 GAGTTACTCATACGAC-1
## 1 0 9 9
## GAGTTACTCGTAGGAG-1 GAGTTGTAGGCTCACC-1 GAGTTGTCAAGTATAG-1 GAGTTGTCACCCAACG-1
## 8 2 3 5
## GAGTTGTGTCTGCATA-1 GAGTTGTTCGCCGAAC-1 GAGTTTGCACAATGAA-1 GAGTTTGGTAGTCGGA-1
## 6 2 0 0
## GAGTTTGTCGACATCA-1 GATAGAAGTAGAGGAA-1 GATCACATCCTCAGGG-1 GATCACATCTGAACGT-1
## 8 9 9 10
## GATCAGTCATCACGGC-1 GATCCCTAGTCACTCA-1 GATCCCTCAATCAAGA-1 GATCGTAAGTAGCCAG-1
## 4 3 2 3
## GATCGTATCTAATTCC-1 GATGAGGCACAAGGTG-1 GATGAGGGTAATCAGA-1 GATGAGGTCATGTCTT-1
## 4 1 7 4
## GATGAGGTCTGAGTCA-1 GATGATCCATATCTCT-1 GATGATCGTTAGGAGC-1 GATGATCTCTCCCATG-1
## 2 2 2 3
## GATGCTAAGAAGCGAA-1 GATGCTAGTCTCGGGT-1 GATGCTATCGCTGTCT-1 GATGGAGCACAACGCC-1
## 8 5 4 9
## GATGGAGTCGTTAGAC-1 GATGTTGCACGATTCA-1 GATTCGACAGGCGAAT-1 GATTCGATCCCTGGTT-1
## 7 0 0 4
## GATTCGATCTTACCAT-1 GATTCTTGTTAGCTAC-1 GATTCTTTCCGAGTGC-1 GATTCTTTCCTTCACG-1
## 3 7 5 1
## GATTGGTGTAGTTCCA-1 GATTGGTTCTCCCTAG-1 GATTTCTAGTCGAAAT-1 GATTTCTCAATAGAGT-1
## 1 11 5 7
## GATTTCTTCCCATGGG-1 GCAACATAGGTAAAGG-1 GCAACATCAAATAGCA-1 GCACATATCCAGTTCC-1
## 3 0 2 0
## GCACGGTCACGAAAGC-1 GCACGGTCAGGAGACT-1 GCACGTGCAAGTGCTT-1 GCACGTGTCTCTGACC-1
## 1 4 0 8
## GCACGTGTCTGTCTCG-1 GCACGTGTCTTGGATG-1 GCACTAAGTCGTCTCT-1 GCACTAAGTGTGCTTA-1
## 2 3 5 5
## GCACTAAGTTCGGTAT-1 GCAGCCAAGTCGAAAT-1 GCAGCCACAGCCCAGT-1 GCAGCTGCAGCGAACA-1
## 0 8 0 1
## GCAGCTGGTGCCCAGT-1 GCAGGCTGTCGGCCTA-1 GCAGGCTGTTAGAAAC-1 GCAGTTATCTGCACCT-1
## 1 3 3 5
## GCATCGGAGCTAGAAT-1 GCATCGGCAGTAGGAC-1 GCATCGGGTAACATGA-1 GCATCGGTCAGGAAAT-1
## 10 5 3 8
## GCATCGGTCTGCGAGC-1 GCATCTCCATCTATCT-1 GCATCTCGTCTACAAC-1 GCATGATCATTACTCT-1
## 0 2 1 1
## GCATGATGTACATACC-1 GCATTAGCAAGACGAC-1 GCATTAGGTCTGGTTA-1 GCATTAGTCGGCATAT-1
## 0 6 5 0
## GCCAACGTCTCTCGCA-1 GCCAGGTGTTTCCAAG-1 GCCAGTGAGGCACTAG-1 GCCAGTGAGTGCCTCG-1
## 10 4 11 3
## GCCATTCAGATGACCG-1 GCCATTCCAACAGAGC-1 GCCATTCCATTCAGGT-1 GCCATTCTCAGGCGAA-1
## 4 8 1 0
## GCCCAGAAGCGGGTAT-1 GCCCAGAGTGCTTATG-1 GCCCAGATCAGGGTAG-1 GCCCAGATCCTGTAGA-1
## 0 2 1 1
## GCCCGAACATACCACA-1 GCCCGAAGTTGTGCCG-1 GCCCGAATCTATTTCG-1 GCCGTGATCCACGTCT-1
## 9 1 2 6
## GCCTGTTCACGGAAGT-1 GCGAGAATCCCTTGTG-1 GCGGAAACATGGGTCC-1 GCGGAAATCAGACCTA-1
## 4 1 7 2
## GCGGATCTCTCCCATG-1 GCGGATCTCTGCACCT-1 GCGTGCAAGCCAAGGT-1 GCGTGCATCAACGCTA-1
## 3 6 5 0
## GCGTTTCGTGGTAATA-1 GCTACAAAGAACTGAT-1 GCTACCTCAAGTATAG-1 GCTACCTCATCGATCA-1
## 8 6 0 2
## GCTACCTTCCCTCATG-1 GCTCAAATCACACCCT-1 GCTCAAATCAGTGTTG-1 GCTCAAATCCGTGGTG-1
## 6 3 3 1
## GCTGAATAGATTAGTG-1 GCTGAATGTAGTTACC-1 GCTGCAGCATCATCCC-1 GCTGCAGGTCTTGTCC-1
## 3 10 6 5
## GCTGCAGTCCCAGGAC-1 GCTGGGTGTCAACGCC-1 GCTGGGTTCAGTGTGT-1 GCTTCACAGCTTAAGA-1
## 3 0 4 5
## GCTTCACCATTGTGCA-1 GCTTGGGGTATAGGGC-1 GCTTGGGGTCTAACGT-1 GCTTGGGGTGATGAAT-1
## 8 3 0 2
## [ reached getOption("max.print") -- omitted 676 entries ]
## Levels: 0 1 2 3 4 5 6 7 8 9 10 11
We can quickly compare the clusters from Bioconductor and Seurat.
tab <- table(Bioc = clustersAll$Bioc, Seurat = clustersAll$Seurat)
rownames(tab) <- paste("Bioc", rownames(tab))
colnames(tab) <- paste("Seurat", colnames(tab))
library(pheatmap)
pheatmap(log10(tab + 10), color = viridis::viridis(100), cluster_cols = FALSE, cluster_rows = FALSE)The FindAllMarkers function allows us to identify markers across all clusters.
## Calculating cluster 0
## Calculating cluster 1
## Calculating cluster 2
## Calculating cluster 3
## Calculating cluster 4
## Calculating cluster 5
## Calculating cluster 6
## Calculating cluster 7
## Calculating cluster 8
## Calculating cluster 9
## Calculating cluster 10
## Calculating cluster 11
## p_val avg_log2FC pct.1 pct.2 p_val_adj cluster gene
## Pzp 1.715897e-267 5.646907 0.968 0.055 6.847289e-263 1 Pzp
## Habp2 3.308956e-228 4.367410 0.927 0.074 1.320439e-223 1 Habp2
## Crp 5.925219e-227 4.383579 0.943 0.079 2.364459e-222 1 Crp
## Adamts5 1.379875e-223 4.735019 0.785 0.027 5.506392e-219 1 Adamts5
## Cyp2j5 6.698187e-220 4.499152 0.822 0.040 2.672912e-215 1 Cyp2j5
## Ugt8a 9.152623e-191 4.181251 0.664 0.019 3.652354e-186 1 Ugt8a